AITopics | auto-regressive language modeling

Primer: Searching for Efficient Transformers for Language Modeling

Neural Information Processing Systems | Apr-25-2026, 08:28:47 GMT

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Decoupled Context Processing for Context Augmented Language Modeling

Zonglin Li

Neural Information Processing Systems | Feb-10-2026, 13:13:34 GMT

Language models can be augmented with a context retriever to incorporate knowledge from large external databases. By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better parameter efficiency, interpretability and modularity.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Primer: Searching for Efficient Transformers for Language Modeling

Neural Information Processing Systems | Feb-8-2026, 02:37:12 GMT

We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Searching for Efficient Transformers for Language Modeling

Neural Information Processing Systems | Dec-23-2025, 22:59:01 GMT

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.

efficient transformer, name change, transformer, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.44)

On the Power of Decision Trees in Auto-Regressive Language Modeling

Neural Information Processing Systems | May-27-2025, 05:12:53 GMT

Originally proposed for handling time series data, Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling. This paper delves into both the theoretical and practical applications of ARDTs in this new context. We theoretically demonstrate that ARDTs can compute complex functions, such as simulating automata, Turing machines, and sparse circuits, by leveraging "chain-of-thought" computations. Our analysis provides bounds on the size, depth, and computational efficiency of ARDTs, highlighting their surprising computational power. Empirically, we train ARDTs on simple language generation tasks, showing that they can learn to generate coherent and grammatically correct text on par with a smaller Transformer model.

auto-regressive language modeling, machine learning, natural language, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.66)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.66)

Searching for Efficient Transformers for Language Modeling

Neural Information Processing Systems | Oct-9-2024, 23:08:06 GMT

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.

efficient transformer, primer, transformer, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.80)

PIXAR: Auto-Regressive Language Modeling in Pixel Space

Tai, Yintao, Liao, Xiyang, Suglia, Alessandro, Vergari, Antonio

arXiv.org Artificial Intelligence | Jan-6-2024

Recent works showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations and are implemented as encoder-decoder models that reconstruct masked image patches of rendered text. However, these pixel-based LLMs are limited to autoencoding tasks and cannot generate new text as images. As such, they cannot be used for open-answer or generative language tasks. In this work, we overcome this limitation and introduce PIXAR, the first pixel-based autoregressive LLM that does not rely on a pre-defined vocabulary for both input and output text. Consisting of only a decoder, PIXAR can answer free-form generative tasks while keeping the text representation learning performance on par with previous encoder-decoder models. Furthermore, we highlight the challenges to autoregressively generate non-blurred text as images and link this to the usual maximum likelihood objective. We propose a simple adversarial pretraining that significantly improves the readability and performance of PIXAR making it comparable to GPT2 on short text generation tasks. This paves the way to building open-vocabulary LLMs that are usable for free-form generative tasks and questions the necessity of the usual symbolic input representation -- text as tokens -- for these challenging tasks.

auto-regressive language modeling, image patch, pixar, (14 more...)

arXiv.org Artificial Intelligence

2401.03321

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Hot papers on arXiv from the past month: September 2021

AIHub | Oct-1-2021, 09:38:22 GMT

Comparing the visual quality of generated frames. From Diverse Generation from a Single Video Made Possible. Reproduced under a CC BY 4.0 license. Here are the most tweeted papers that were uploaded onto arXiv during September 2021. Results are powered by Arxiv Sanity Preserver. Abstract: Generative adversary network (GAN) generated high-realistic human faces have been used as profile images for fake social media accounts and are visually challenging to discern from real ones.

arxiv, submitted, video, (15 more...)

AIHub

Technology: